DOCUMENT RESUME 



ED 408 305 



TM 026 507 



AUTHOR Kim, Seock-Ho
TITLE An Evaluation of Hierarchical Bayes Estimation for the Two-Parameter Logistic Model.
PUB DATE Mar 97
NOTE 33p.; Paper presented at the Annual Meeting of the American Educational Research Association (Chicago, IL, March 1997).
PUB TYPE Reports - Evaluative (142) -- Speeches/Meeting Papers (150)
EDRS PRICE MF01/PC02 Plus Postage.
DESCRIPTORS *Bayesian Statistics; Difficulty Level; *Estimation (Mathematics); *Item Bias; Maximum Likelihood Statistics; Sample Size; *Test Items
IDENTIFIERS *Hierarchical Analysis; Item Discrimination (Tests); Two Parameter Model



ABSTRACT 



Hierarchical Bayes procedures for the two-parameter logistic 
item response model were compared for estimating item parameters. Simulated 
data sets were analyzed using two different Bayes estimation procedures, the 
two-stage hierarchical Bayes estimation (HB2) and the marginal Bayesian with
known hyperparameters (MB), and marginal maximum likelihood estimation (ML).
Three different prior distributions were employed in the two Bayes estimation 
procedures. HB2 and MB yielded consistently smaller root mean square 
differences and mean euclidean distances than ML. The HB2 and MB estimates of 
item discrimination parameters yielded relatively larger biases than the ML 
estimates. As the sample size increased, the three estimation procedures 
yielded essentially the same bias pattern for item discrimination. Bias 
results of item difficulty show no differences among the estimation 
procedures. Tight prior conditions yielded smaller root mean square 
differences and mean euclidean distances. An appendix discusses the estimate 
of the unknown item parameters in detail. (Contains 2 figures, 4 tables, and 
45 references.) (Author/SLD) 






An Evaluation of Hierarchical Bayes Estimation for 
the Two-Parameter Logistic Model 

Seock-Ho Kim 
The University of Georgia 

March, 1997 

Running Head: HIERARCHICAL BAYES ESTIMATION 
Paper presented at the annual meeting of the American Educational Research Association, Chicago.






Abstract 

Hierarchical Bayes procedures for the two-parameter logistic item response model were 
compared for estimating item parameters. Simulated data sets were analyzed using two 
different Bayes estimation procedures, the two-stage hierarchical Bayes estimation (HB2) 
and the marginal Bayesian with known hyperparameters (MB), and marginal maximum 
likelihood estimation (ML). Three different prior distributions were employed in the two 
Bayes estimation procedures. HB2 and MB yielded consistently smaller root mean square 
differences and mean euclidean distances than ML. The HB2 and MB estimates of item
discrimination parameters yielded relatively larger biases than the ML estimates. As the 
sample size increased, the three estimation procedures yielded essentially the same bias 
pattern for item discrimination. Bias results of item difficulty show no differences among the 
estimation procedures. Tight prior conditions yielded smaller root mean square differences 
and mean euclidean distances. 

Key words: Bayes estimation, hierarchical prior, item response theory, marginal Bayesian 
estimation, maximum likelihood estimation. 
Introduction



Ever since Birnbaum (1969) presented Bayes methods of estimating ability parameters, 
a number of Bayesian approaches have been proposed under item response theory (IRT) 
for estimating item and ability parameters. The key feature of the Bayesian approach 
is its reliance upon simple probability theory that provides a theoretical framework for 
incorporating prior information or belief into the estimation of parameters to improve 
accuracy of estimates. 

Currently the Bayesian approaches in IRT can be distinguished by whether the estimation 
of item parameters takes place with marginalization over incidental ability parameters 
(Mislevy, 1986; Tsutakawa & Lin, 1986) or without any marginalization (Kim, Cohen, Baker, 
Subkoviak, & Leonard, 1994; Swaminathan & Gifford, 1982, 1985, 1986). The marginal 
modes may provide better approximations to the posterior means in the presence of nuisance 
parameters than the joint modes (Mislevy, 1986; O’Hagan, 1976; Tsutakawa & Lin, 1986). 
This point has been empirically demonstrated by Kim et al. (1994), especially for small data 
sets. 

Since specification of priors in Bayesian analysis is a subjective matter, a number of
different forms of priors have been studied in the estimation of item parameters. The hierarchical
Bayes approach, suggested by Good (1980, 1983), Lindley (1971), and Lindley and Smith
(1972), has been successfully applied to the estimation of item and ability parameters
(Mislevy, 1986; Swaminathan & Gifford, 1982, 1985, 1986). Kim (1994) presented a
two-stage hierarchical Bayes estimation of item parameters which involved marginalization
over incidental ability parameters (i.e., marginal Bayesian estimation with a two-stage
hierarchical prior). Kim (1994) compared the item parameter estimates yielded by this
two-stage hierarchical Bayes estimation with those obtained via maximum likelihood estimation
and via other marginal Bayesian estimation procedures using LSAT-6 and LSAT-7 data sets
(Bock & Lieberman, 1970). He found that the item parameter estimates yielded by the
marginal Bayesian estimation procedures with different prior distributions were very similar.
Parameter estimates yielded by the empirical Bayes estimation procedure for LSAT-6 and
LSAT-7 were different from those yielded by other estimation procedures. However, these
results were based on limited examples. It is of interest, thus, to compare the characteristics
of item parameter estimates yielded by the two-stage hierarchical Bayes estimation with those
obtained via marginal Bayesian estimation with different priors and via marginal maximum
likelihood estimation in a recovery study context.

Complete exploitation of the potential of the Bayesian estimation requires understanding 
of its mathematical underpinnings, particularly the role of prior distributions in estimating 
parameters. In a classical Bayesian approach, a single prior can be selected for the 
ordinary parameters. It is also possible to recognize some uncertainty in priors. When priors
are expressed in terms of a family or class of priors, the parameters of that class are
called hyperparameters. Hyperparameters describe the distributional characteristics of
the prior information. It is sometimes also convenient to specify prior information on the
hyperparameters as well. This second prior is called a hyperprior and contains parameters
which are referred to as hyperhyperparameters (Good, 1980, 1983; Lindley, 1971; Lindley &
Smith, 1972).

In this paper, we first present marginal Bayesian estimation of item parameters with a 
two-stage hierarchical prior distribution for the two-parameter logistic IRT model. Next, 
we present empirical comparisons among two Bayes estimation procedures (i.e., the two- 
stage hierarchical Bayes estimation and the marginal Bayesian estimation with known 
hyperparameters) and the marginal maximum likelihood estimation procedure. Three 
different priors were employed in the Bayes estimation procedures. It can be noted that point 
estimates of the ability parameters do not arise during the course of the marginal Bayesian 
estimation of item parameters. They are calculated after obtaining the estimates of item 
parameters, assuming the item parameters to be known (Bock & Aitkin, 1981; Mislevy & 
Bock, 1990). We do not discuss the estimation of ability parameters in this paper. 

Theoretical Framework 
IRT Model and Marginalization 

Consider binary responses to a test with $n$ items by each of $N$ examinees. A response
of examinee $i$ to item $j$ is represented by a random variable $Y_{ij}$, where $i = 1, \ldots, N$ and
$j = 1, \ldots, n$. The probability of a correct response of examinee $i$ to item $j$ is represented
by $P(Y_{ij} = 1 \mid \theta_i, \xi_j) = P_j(\theta_i)$ and the probability of an incorrect response is given by
$P(Y_{ij} = 0 \mid \theta_i, \xi_j) = 1 - P_j(\theta_i) = Q_j(\theta_i)$, depending on a real-valued ability parameter $\theta_i$
and a real- or vector-valued item parameter $\xi_j$. For the two-parameter logistic model, the
probability of a correct response has the form

$$P_j(\theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}, \qquad (1)$$

where $\xi_j = (a_j, b_j)'$, and $a_j$ and $b_j$ are the item discrimination and difficulty parameters,
respectively.

For examinee $i$, there is an observed vector of dichotomously scored item responses of
length $n$ denoted by $y_i = (y_{i1}, \ldots, y_{in})'$. Under the assumption of conditional independence,
the probability of $y_i$ given $\theta_i$ and the vector of all item parameters, $\xi = (a_1, b_1, \ldots, a_n, b_n)'$,
is

$$p(y_i \mid \theta_i, \xi) = \prod_{j=1}^{n} P_j(\theta_i)^{y_{ij}} Q_j(\theta_i)^{1 - y_{ij}}. \qquad (2)$$

The marginal probability of obtaining the response vector $y_i$ for examinee $i$ sampled from a
given population is

$$p(y_i \mid \xi) = \int_{\Theta} p(y_i \mid \theta_i, \xi)\, p(\theta_i)\, d\theta_i, \qquad (3)$$

where $\Theta$ is the parameter space and $p(\theta_i)$ is a continuous population distribution of $\theta_i$.
Without loss of generality, we assume that $\theta_i$ are independent and identically distributed as
standard normal, $\theta_i \sim N(0, 1)$. This assumption can be relaxed as the ability distribution
may be empirically characterized (Bock & Aitkin, 1981). The marginal probability of $y_i$ can
be approximated with any specified degree of precision by Gaussian quadrature formulas
(Stroud & Secrest, 1966) using

$$p(y_i \mid \xi) = \sum_{k=1}^{q} p(y_i \mid X_k, \xi)\, A(X_k), \qquad (4)$$

where $X_k$ are called the nodes, $A(X_k)$ are the corresponding weights, and $q$ is the number of
nodes. Since we assume $\theta_i$ are randomly sampled from $N(0, 1)$, we may use Gauss-Hermite
quadratures, for example, $X_k = \sqrt{2}\, X_k^*$ and $A(X_k) = A(X_k^*)/\sqrt{\pi}$, where $X_k^*$ and $A(X_k^*)$
are obtained from Stroud and Secrest (1966).
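The quadrature approximation in Equation 4 can be illustrated with a minimal Python sketch. It is not the implementation used in BILOG or HBAYES. It assumes a five-point Gauss-Hermite rule with standard tabulated nodes and weights (far fewer points than would be used in practice), and the function names `p_correct`, `conditional_prob`, and `marginal_prob`, as well as the three example items, are hypothetical.

```python
import math

# Five-point Gauss-Hermite rule for the weight function exp(-x^2)
# (standard tabulated values; an assumption made for illustration only).
_GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
             0.958572464613819, 2.020182870456086]
_GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
               0.393619323152241, 0.019953242059046]

# Rescale so the rule integrates against the N(0, 1) density:
# X_k = sqrt(2) * X_k^*  and  A(X_k) = A(X_k^*) / sqrt(pi).
NODES = [math.sqrt(2.0) * x for x in _GH_NODES]
WEIGHTS = [w / math.sqrt(math.pi) for w in _GH_WEIGHTS]


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def conditional_prob(y, theta, items):
    """p(y_i | theta_i, xi) under conditional independence (Equation 2)."""
    prob = 1.0
    for y_j, (a, b) in zip(y, items):
        p = p_correct(theta, a, b)
        prob *= p if y_j == 1 else 1.0 - p
    return prob


def marginal_prob(y, items):
    """Quadrature approximation to p(y_i | xi) (Equation 4)."""
    return sum(conditional_prob(y, x, items) * w
               for x, w in zip(NODES, WEIGHTS))


# Hypothetical three-item test given as (a_j, b_j) pairs.
items = [(0.66, -1.38), (1.0, 0.0), (1.51, 1.38)]
print(marginal_prob([1, 1, 0], items))
```

Because the rescaled weights sum to one, the approximate marginal probabilities of all possible response patterns also sum to one, which gives a quick sanity check on any chosen set of nodes and weights.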

The marginal probability of obtaining the $N \times n$ response matrix $y$ is then given by

$$p(y \mid \xi) = \prod_{i=1}^{N} p(y_i \mid \xi) = l(\xi \mid y), \qquad (5)$$

where $l(\xi \mid y)$ may be regarded as a function of $\xi$ given the data $y$. Bayes' theorem tells us
that the posterior probability distribution for $\xi$ given the data $y$ is proportional to the product
of the likelihood for $\xi$ given $y$ and the distribution for $\xi$ prior to the data. That is,

$$p(\xi \mid y) \propto l(\xi \mid y)\, p(\xi), \qquad (6)$$

where $\propto$ denotes proportionality. The likelihood function represents the information about
$\xi$ obtained from the data through which the data $y$ may modify our prior knowledge of $\xi$. A
prior distribution represents what is known about unknown parameters before the data are
obtained. Prior knowledge or relative ignorance can be represented by such a distribution.

Parameter Estimation in IRT 

Lord (1986) presented advantages and disadvantages of several parameter estimation
methods in IRT. Birnbaum (1968) and Lord (1980) recommend the estimation of $\theta$
and $\xi$ by joint maximization of their likelihood function

$$p(y \mid \theta, \xi) = \prod_{i=1}^{N} \prod_{j=1}^{n} P_j(\theta_i)^{y_{ij}} Q_j(\theta_i)^{1 - y_{ij}} = l(\theta, \xi \mid y), \qquad (7)$$

where $\theta = (\theta_1, \ldots, \theta_N)'$. Specifically, the item parameter estimation part for maximizing
$l(\xi \mid y, \theta)$ and the ability parameter estimation part for maximizing $l(\theta \mid y, \xi)$ are iterated to
obtain stable estimates of item and ability parameters.

Extending the idea of joint maximization, Swaminathan and Gifford (1982, 1985, 1986)
suggested that $\theta$ and $\xi$ can be estimated by joint maximization with respect to these
parameters of the posterior density

$$p(\theta, \xi \mid y) \propto l(\theta, \xi \mid y)\, p(\theta, \xi), \qquad (8)$$

where $p(\theta, \xi)$ is the joint prior density of the parameters $\theta$ and $\xi$. On the assumption
that priors of $\theta$ and $\xi$ are independently distributed with probability density functions
$p(\theta)$ and $p(\xi)$, the item parameter estimation part which maximizes $l(\xi \mid y, \theta)\, p(\xi)$ and the
ability parameter estimation part which maximizes $l(\theta \mid y, \xi)\, p(\theta)$ are iterated to obtain stable
estimates of item and ability parameters.

Alternatively, Bock and Aitkin (1981), Bock and Lieberman (1970), Harwell, Baker, and
Zwarts (1988), and Tsutakawa (1984) presented estimation of $\xi$ by maximization of the
marginal or integrated likelihood in Equation 5. The development of marginal maximum
likelihood estimation was motivated by the structural and incidental parameters problem
(Baker, 1987). Assuming that the IRT model and the ability distribution are properly
specified, the resulting item parameter estimates are consistent for tests of finite length
(Bock & Aitkin, 1981).

Since the marginal likelihood in Equation 5 is not a probability density function, we
cannot make a probabilistic statement regarding $\xi$. We can accomplish this by analyzing
the marginal posterior distribution in Equation 6 (e.g., Harwell & Baker, 1991; Leonard
& Novick, 1985; Mislevy, 1986; Tsutakawa, 1992; Tsutakawa & Lin, 1986). The posterior
density represents a compromise between the likelihood and the prior density. Hence, an
important element of Bayesian inference is the prior information concerning $\xi$. In Bayesian
analysis it is necessary to have a convenient way to quantify such information.

Prior and Posterior Distribution 

Prior information for parameters is expressed in terms of probability distributions in the 
Bayesian approach. It can be noted that a flexible family of prior distributions is available 
by transforming item parameters into new parameters which may be distributed as a 
multivariate normal distribution. Following Leonard and Novick (1985) and Mislevy (1986), 
we use the transformation $\alpha_j = \log a_j$. We may also write $\beta_j = b_j$ and $\xi_j = (\alpha_j, \beta_j)'$.

We assume that the vector of item parameters $\xi$ possesses a multivariate normal
distribution conditional on the respective mean vector $\boldsymbol{\mu}_\xi$ and covariance matrix $\Sigma_\xi$. The
complete form of the hierarchical prior distribution of item parameters is given by

$$p(\xi, \eta) = p_1(\xi \mid \eta)\, p_2(\eta), \qquad (9)$$

where the hyperparameter $\eta = (\boldsymbol{\mu}_\xi, \Sigma_\xi)$, and the subscripts 1 and 2 denote the first stage
and the second stage, respectively, of the prior distribution.

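If we assume the vectors of item parameters $\alpha$ and $\beta$ are independent, we can take the
vectors to possess independent multivariate normal distributions, conditional on their mean
vectors, $\boldsymbol{\mu}_\alpha$ and $\boldsymbol{\mu}_\beta$, and covariance matrices, $\Sigma_\alpha$ and $\Sigma_\beta$. Then

$$p_1(\xi \mid \eta)\, p_2(\eta) = p_1(\alpha \mid \eta_\alpha)\, p_1(\beta \mid \eta_\beta)\, p_2(\eta_\alpha)\, p_2(\eta_\beta), \qquad (10)$$

where $\eta_\alpha = (\boldsymbol{\mu}_\alpha, \Sigma_\alpha)$ and $\eta_\beta = (\boldsymbol{\mu}_\beta, \Sigma_\beta)$.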

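When we further assume exchangeability for all parameters, we may take $\boldsymbol{\mu}_\alpha = \mu_\alpha \mathbf{1}$,
$\Sigma_\alpha = \sigma_\alpha^2 I_n$, $\boldsymbol{\mu}_\beta = \mu_\beta \mathbf{1}$, and $\Sigma_\beta = \sigma_\beta^2 I_n$, where $\mu_\alpha$, $\sigma_\alpha^2$, $\mu_\beta$, and $\sigma_\beta^2$ are scalars, $\mathbf{1}$ is an $n \times 1$
vector of ones, and $I_n$ is an identity matrix of order $n$ (Leonard & Novick, 1985). The first
stage prior distribution can be expressed as

$$p_1(\xi \mid \eta) = \prod_{j=1}^{n} p_1(\alpha_j \mid \mu_\alpha, \sigma_\alpha^2)\, p_1(\beta_j \mid \mu_\beta, \sigma_\beta^2), \qquad (11)$$

where

$$p_1(\alpha_j \mid \mu_\alpha, \sigma_\alpha^2) = (2\pi\sigma_\alpha^2)^{-1/2} \exp\left\{-\frac{(\alpha_j - \mu_\alpha)^2}{2\sigma_\alpha^2}\right\} \qquad (12)$$

and $p_1(\beta_j \mid \mu_\beta, \sigma_\beta^2)$ can be similarly defined. A hierarchical Bayes approach then assigns
another stage of priors to the hyperparameter $\eta$.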

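Hyperpriors for $\mu_\alpha$ and $\sigma_\alpha^2$ can be specified by assuming that $\mu_\alpha$ has a noninformative
uniform distribution and $\nu_\alpha \lambda_\alpha / \sigma_\alpha^2$ is distributed as $\chi^2_{\nu_\alpha}$, where $\nu_\alpha$ is the degrees of freedom.
Hence,

$$p_2(\eta_\alpha) = p_2(\mu_\alpha)\, p_2(\sigma_\alpha^2 \mid \nu_\alpha, \lambda_\alpha) \propto \frac{(\sigma_\alpha^2)^{-(\nu_\alpha/2 + 1)}}{\Gamma(\nu_\alpha/2)\,(\nu_\alpha \lambda_\alpha/2)^{-\nu_\alpha/2}} \exp\left(-\frac{\nu_\alpha \lambda_\alpha}{2\sigma_\alpha^2}\right), \qquad (13)$$

where $\nu_\alpha$ and $\lambda_\alpha$ are hyperhyperparameters. Now the prior distribution for $\alpha$ can be
expressed as

$$p_1(\alpha \mid \eta_\alpha)\, p_2(\eta_\alpha) \propto \left[\prod_{j=1}^{n} (2\pi\sigma_\alpha^2)^{-1/2} \exp\left\{-\frac{(\alpha_j - \mu_\alpha)^2}{2\sigma_\alpha^2}\right\}\right] \times \frac{(\sigma_\alpha^2)^{-(\nu_\alpha/2 + 1)}}{\Gamma(\nu_\alpha/2)\,(\nu_\alpha \lambda_\alpha/2)^{-\nu_\alpha/2}} \exp\left(-\frac{\nu_\alpha \lambda_\alpha}{2\sigma_\alpha^2}\right). \qquad (14)$$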

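The above equation depends upon nuisance parameters, $\mu_\alpha$ and $\sigma_\alpha^2$, and these can be
integrated out. Integrating out $\mu_\alpha$ and $\sigma_\alpha^2$ yields

$$\int_0^\infty \int_{-\infty}^{\infty} p_1(\alpha \mid \eta_\alpha)\, p_2(\eta_\alpha)\, d\mu_\alpha\, d\sigma_\alpha^2 = p(\alpha \mid \nu_\alpha, \lambda_\alpha) \propto \left\{\nu_\alpha \lambda_\alpha + \sum_{j=1}^{n} (\alpha_j - \bar{\alpha})^2\right\}^{-(n + \nu_\alpha - 1)/2}, \qquad (15)$$

where $\bar{\alpha} = n^{-1} \sum_{j=1}^{n} \alpha_j$.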

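Similar specification yields $p(\beta \mid \nu_\beta, \lambda_\beta)$. As we integrated out the hyperparameter $\eta$, we can
express the prior distribution of item parameters as

$$p(\xi \mid \eta^{(2)}) = p(\alpha \mid \nu_\alpha, \lambda_\alpha)\, p(\beta \mid \nu_\beta, \lambda_\beta), \qquad (16)$$

where the hyperhyperparameter $\eta^{(2)} = (\nu_\alpha, \lambda_\alpha, \nu_\beta, \lambda_\beta)$.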

In fact, the complete prior for the hierarchical model, assuming independence between
ability and item parameters, can be written as

$$p(\theta, \tau, \xi, \eta) = p(\theta, \tau)\, p(\xi, \eta) = p_1(\theta \mid \tau)\, p_2(\tau)\, p_1(\xi \mid \eta)\, p_2(\eta), \qquad (17)$$

where $p_1(\theta \mid \tau)$ is the first stage density of $\theta$ conditional on $\tau$, $\tau$ are examinee population
parameters which take the second stage density $p_2(\tau)$, $p_1(\xi \mid \eta)$ is the first stage density of
$\xi$ conditional on $\eta$, and $\eta$ are item population parameters which follow the second stage
density $p_2(\eta)$ (Mislevy, 1986). In this paper, we assumed $\tau = (\mu_\theta, \sigma_\theta) = (0, 1)$ is given and
$\eta = (\boldsymbol{\mu}_\xi, \Sigma_\xi)$ is integrated out. The prior distribution of both item and ability parameters
given $\tau$ and the hyperhyperparameter $\eta^{(2)}$ can be written as

$$p(\theta \mid \tau)\, p(\xi \mid \eta^{(2)}). \qquad (18)$$
The marginal posterior distribution given $\tau$ and $\eta^{(2)}$ is then

$$p(\xi \mid y, \tau, \eta^{(2)}) \propto p(y \mid \xi, \tau)\, p(\xi \mid \eta^{(2)}) = l(\xi \mid y, \tau)\, p(\xi \mid \eta^{(2)}). \qquad (19)$$

Marginal Bayesian modal estimates of item parameters can be found by maximizing the
marginal posterior distribution with respect to $\xi$. The Appendix presents a brief description of
procedures for implementation of the marginal Bayes modal estimation with the two-stage
hierarchical priors.
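To make the structure of the marginal posterior concrete, the following Python sketch evaluates the logarithm of the hierarchical prior kernel and adds it to a log marginal likelihood. It is an illustration only, not the HBAYES program. It assumes that integrating a uniform prior on the mean and an inverse chi-square prior on the variance out of an exchangeable normal first stage gives a kernel proportional to $\{\nu\lambda + \sum_j (\alpha_j - \bar{\alpha})^2\}^{-(n+\nu-1)/2}$, and all function and argument names are hypothetical.

```python
import math


def log_hier_prior_kernel(values, nu, lam):
    """Log kernel of the prior after integrating out mean and variance.

    Assumed form: -(n + nu - 1) / 2 * log(nu * lam + sum of squared
    deviations from the mean), dropping additive constants.
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    return -0.5 * (n + nu - 1) * math.log(nu * lam + ss)


def log_item_prior(alpha, beta, nu_a=8.0, lam_a=0.25, nu_b=None, lam_b=None):
    """Log prior kernel for (alpha, beta); beta is left flat unless
    both nu_b and lam_b are supplied."""
    log_p = log_hier_prior_kernel(alpha, nu_a, lam_a)
    if nu_b is not None and lam_b is not None:
        log_p += log_hier_prior_kernel(beta, nu_b, lam_b)
    return log_p


def log_posterior_kernel(log_marginal_lik, alpha, beta, **prior_args):
    """Log marginal posterior up to a constant: log-likelihood plus log prior."""
    return log_marginal_lik + log_item_prior(alpha, beta, **prior_args)


# Hypothetical item parameters on the transformed scale alpha_j = log a_j.
alpha = [math.log(a) for a in (0.66, 1.0, 1.51)]
beta = [-1.38, 0.0, 1.38]
print(log_item_prior(alpha, beta))
print(log_item_prior(alpha, beta, lam_a=0.09, nu_b=8.0, lam_b=1.0))
```

The kernel depends on the parameters only through their spread around their own mean, so it pulls estimates toward one another rather than toward a fixed value; a modal estimate would be obtained by maximizing `log_posterior_kernel` over the item parameters with a numerical optimizer.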

Method 

Data were simulated under the following conditions: (1) number of examinees ($N = 100, 300$),
(2) number of items ($n = 15, 45$), (3) estimation (HB2, MB, ML), and (4)
prior condition (prior-$\alpha_L$, prior-$\alpha_T$, prior-$\alpha\beta_T$). The sample sizes and the test lengths were
selected to emulate the situation in which estimation procedures and priors might have some
impact upon item and ability parameter estimates. The sample size and the test length were
completely crossed to yield four situations.

Three estimation procedures were used: the two-stage hierarchical Bayes estimation
(HB2), the marginal Bayesian with known hyperparameters (MB), and marginal maximum
likelihood estimation (ML). The two Bayes estimation procedures, HB2 and MB, had the
three prior conditions: prior-$\alpha_L$, prior-$\alpha_T$, and prior-$\alpha\beta_T$. The prior-$\alpha_L$ condition used a
loose prior for the transformed item discrimination; the prior-$\alpha_T$ condition used a tight prior
for the transformed item discrimination; and the prior-$\alpha\beta_T$ condition used tight priors for
both the transformed item discrimination and the item difficulty. The exact specification of
each prior condition is presented in a subsequent section on the item and ability parameter
estimation. ML, of course, did not employ a prior distribution in estimation.

Data Generation 

The data sets used in this study were the same as those used in Kim et al. (1994).
Dichotomous item response vectors were generated using the two-parameter logistic model
via the computer program GENIRV (Baker, 1982). Based on the usual ranges of
item parameters for the two-parameter logistic model, the underlying transformed item
discrimination parameters were assumed to be normally distributed with mean 0 and variance
.09, $\alpha_j \sim N(0, .09)$. The underlying item discrimination parameters $a_j$ are distributed with
mean 1.046 and variance .103. The underlying item difficulty parameters are distributed
normally with mean 0 and variance 1, $b_j \sim N(0, 1)$. For data generation purposes,
an approximation based on histograms was adopted instead of selecting item parameters
randomly from a specified distribution. Item discrimination and item difficulty parameters
for the 15-item test were set to have three different values (the number of items is given
in parentheses): Item discrimination parameters were .66 (4), 1 (7), and 1.51 (4), and item
difficulty parameters were -1.38 (4), 0 (7), and 1.38 (4). For the 45-item test, each of the
item parameters was set to have five different values: Item discrimination parameters were
.57 (4), .76 (9), 1 (19), 1.32 (9), and 1.77 (4), and item difficulty parameters were -1.9 (4),
-.95 (9), 0 (19), .95 (9), and 1.9 (4). There was no correlation between item discrimination
and difficulty parameters.

The underlying ability parameters were matched to the item difficulty distribution.
Hence, a normal distribution with mean 0 and variance 1, $\theta_i \sim N(0, 1)$, was used to specify
the underlying ability parameters. Also, an approximation based on histograms was adopted
for ability and yielded 11 ability levels. For the 100-examinee sample, the ability parameters
were set to be -2.5 (1), -2 (3), -1.5 (7), -1 (12), -.5 (17), 0 (20), .5 (17), 1 (12), 1.5 (7),
2 (3), and 2.5 (1), where parentheses contain the number of examinees. For the 300-examinee
sample, the ability parameters were set to be -2.5 (4), -2 (8), -1.5 (20), -1 (36), -.5 (52),
0 (60), .5 (52), 1 (36), 1.5 (20), 2 (8), and 2.5 (4).

For each of the factors of sample size and test length, four replications of the simulated 
data were generated. Since the two factors were completely crossed, a total of 16 GENIRV 
runs was needed to obtain the data sets for the study. 
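The design above can be sketched in Python for the 15-item, 100-examinee condition. This is not GENIRV. The source does not say how discrimination and difficulty values were paired across items, so the random pairing below is an assumption (it gives zero correlation only in expectation), and the helper names `expand` and `simulate` and the seed are hypothetical. The last lines check the stated lognormal mean and variance of $a_j$ implied by $\alpha_j \sim N(0, .09)$.

```python
import math
import random


def expand(levels):
    """Turn [(value, count), ...] into a flat list of repeated values."""
    return [value for value, count in levels for _ in range(count)]


# 15-item design: parameter values with the number of items at each value.
A_LEVELS = [(0.66, 4), (1.0, 7), (1.51, 4)]
B_LEVELS = [(-1.38, 4), (0.0, 7), (1.38, 4)]
# 100-examinee design: 11 ability levels with the number of examinees.
THETA_LEVELS = [(-2.5, 1), (-2.0, 3), (-1.5, 7), (-1.0, 12), (-0.5, 17),
                (0.0, 20), (0.5, 17), (1.0, 12), (1.5, 7), (2.0, 3), (2.5, 1)]


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate(thetas, a, b, rng):
    """Generate an N x n matrix of 0/1 responses."""
    return [[1 if rng.random() < p_correct(t, a_j, b_j) else 0
             for a_j, b_j in zip(a, b)]
            for t in thetas]


rng = random.Random(1997)   # arbitrary seed, for reproducibility only
a = expand(A_LEVELS)
b = expand(B_LEVELS)
rng.shuffle(b)              # assumed pairing of a and b values
thetas = expand(THETA_LEVELS)
data = simulate(thetas, a, b, rng)

# Moments of a_j = exp(alpha_j) when alpha_j ~ N(0, sigma2).
sigma2 = 0.09
lognormal_mean = math.exp(sigma2 / 2.0)
lognormal_var = (math.exp(sigma2) - 1.0) * math.exp(sigma2)
print(len(data), len(data[0]), round(lognormal_mean, 3), round(lognormal_var, 3))
# prints: 100 15 1.046 0.103
```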



Item and Ability Parameter Estimation 

Each of the generated data sets was analyzed via the computer program BILOG (Mislevy
& Bock, 1990) for the MB and ML procedures and via the computer program HBAYES,
specifically developed for this study to provide the HB2 estimates. In each Bayes estimation
procedure, three prior conditions, prior-$\alpha_L$, prior-$\alpha_T$, and prior-$\alpha\beta_T$, were employed. Note
that a prior was not employed in ML. Hence, for example, the generated item response data
set for the first replication of sample size 100 and test length 15 was analyzed by seven
computer runs (two Bayes estimation procedures with three prior conditions and maximum
likelihood estimation).

In the prior-$\alpha_L$ condition for MB, a lognormal prior with mean 0 and variance .25 was
used, that is, $\ln a_j \sim N(0, .25)$. This is, in fact, the default prior specification in BILOG
for the estimation of item parameters of the two-parameter logistic model. In the prior-$\alpha_T$
condition for MB, a lognormal distribution with mean 0 and variance .09, $\ln a_j \sim N(0, .09)$,
was used. For the prior-$\alpha\beta_T$ condition for MB, the same prior as in the prior-$\alpha_T$ condition
was used, along with a normal prior for the item difficulty with mean 0 and variance 1,
$b_j \sim N(0, 1)$.

For HB2, the mean hyperparameter was assumed to have a noninformative uniform
distribution and the variance hyperparameter was set to have an inverse chi-square
distribution. In the prior-$\alpha_L$ condition, the inverse chi-square distribution with $\nu_\alpha = 8$ and
$\lambda_\alpha = .25$ was used for the variance hyperparameter of the transformed item discrimination
parameters: $\nu_\alpha \lambda_\alpha / \sigma_\alpha^2 \sim \chi^2_{\nu_\alpha}$; thus, $2/\sigma_\alpha^2 \sim \chi^2_8$. The inverse chi-square distribution with
parameters $\nu_\alpha = 8$ and $\lambda_\alpha = .09$ was used in the prior-$\alpha_T$ condition: $.72/\sigma_\alpha^2 \sim \chi^2_8$. Two
inverse chi-square distributions with parameters $\nu_\alpha = 8$ and $\lambda_\alpha = .09$, and $\nu_\beta = 8$ and
$\lambda_\beta = 1$ for the variance hyperparameters of the transformed item discrimination and of the
item difficulty, respectively, were adopted for the prior-$\alpha\beta_T$ condition: $.72/\sigma_\alpha^2 \sim \chi^2_8$ and
$8/\sigma_\beta^2 \sim \chi^2_8$.

When the mean hyperparameter is assumed to have a fixed value, $\mu$, the specification
of the variance hyperparameter by the inverse chi-square distribution with parameters $\nu$
and $\lambda$ (i.e., $\nu\lambda/\sigma^2 \sim \chi^2_\nu$) yields the parameter of interest which is distributed as a $t$ with
mean $\mu$, scale parameter $\lambda$, and degrees of freedom $\nu$, $t(\nu, \mu, \lambda)$ (Berger, 1985); its variance
is $\nu\lambda/(\nu - 2)$ for $\nu > 2$. Therefore, for the
transformed item discrimination, assuming the mean hyperparameter $\mu_\alpha$ has a fixed value,
specification of the hyperparameter of variance by the inverse chi-square with $\nu_\alpha = 8$ and
$\lambda_\alpha = .25$ yields a transformed item discrimination parameter which is distributed as a $t$
with mean $\mu_\alpha$, scale parameter $\lambda_\alpha = .25$, and degrees of freedom $\nu_\alpha = 8$, that is, $\alpha_j \sim t(8, \mu_\alpha, .25)$.
Similarly, the specification with $\nu_\alpha = 8$ and $\lambda_\alpha = .09$ implies $\alpha_j \sim t(8, \mu_\alpha, .09)$; and the
specification with $\nu_\beta = 8$ and $\lambda_\beta = 1$ yields $\beta_j \sim t(8, \mu_\beta, 1)$. Because HB2 actually
assumed a noninformative prior for the mean hyperparameter rather than a fixed value, the
specifications used in HB2 do not reproduce exactly the item hyperparameter specifications
used in MB, but they are similar to their MB counterparts.

Metric Transformation 

In parameter recovery studies, such as the present one, comparisons between two or more 
sets of estimates and the underlying parameters require that the item and ability estimates 
obtained from different calibration runs and their parameters be placed on a common metric
(Baker & Al-Karni, 1991; Yen, 1987). Parameter estimation procedures under IRT yield 
metrics which are unique up to a linear transformation. To link both sets of estimates and 
parameters, it is necessary to determine the slope and intercept of the equating coefficients 
required for the transformation. The estimates of the item and ability parameters for each 
of the estimation procedures were placed on the scale of the true parameters using the test 
characteristic curve method by Stocking and Lord (1983) as implemented in the computer 
program EQUATE (Baker, 1993). 
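The role of the slope and intercept can be illustrated with a small Python sketch. It does not implement the Stocking and Lord (1983) procedure or the EQUATE program, which choose the coefficients by matching test characteristic curves; it only shows how a given pair of coefficients would be applied under the two-parameter logistic model and that the response probability is unchanged. The names `rescale_item` and `rescale_ability` and the numerical values are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rescale_item(a, b, slope, intercept):
    """Item parameters on the metric theta* = slope * theta + intercept."""
    return a / slope, slope * b + intercept


def rescale_ability(theta, slope, intercept):
    """Ability on the target metric."""
    return slope * theta + intercept


# Hypothetical equating coefficients and parameter values.
slope, intercept = 1.1, -0.2
a, b, theta = 1.51, 1.38, 0.5

a_star, b_star = rescale_item(a, b, slope, intercept)
theta_star = rescale_ability(theta, slope, intercept)

# Both calls describe the same response probability on different metrics.
print(p_correct(theta, a, b), p_correct(theta_star, a_star, b_star))
```

Since $a^*(\theta^* - b^*) = a(\theta - b)$ for any nonzero slope, the data alone cannot fix the metric, which is why estimates from separate calibration runs must be linked before they are compared with the generating values.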

Criteria 

The empirical evaluation in this study involved four criteria: root mean square difference
(RMSD), correlation, bias, and mean euclidean distance (MED). RMSD is the square
root of the average of the squared differences between estimated and true values. For item
discrimination, for example, RMSD is $\{(1/n) \sum_{j=1}^{n} (\hat{a}_j - a_j)^2\}^{1/2}$.

The bias $B$ of a point estimator is the difference between the expected value of the
estimates and the corresponding parameter (Mendenhall, Scheaffer, & Wackerly, 1981). The
bias of the item discrimination estimates, for example, is given by $B_{a_j} = E(\hat{a}_j) - a_j$. The
bias was obtained with regard to the underlying parameters across the four replications.

Since it is possible that an estimation procedure may function better at recovery of one
type of item parameter than at recovery of the other, it is also useful to consider a single
index which can describe simultaneously the quality of the recovery for both item parameters.
MED provides such an index (Rudin, 1976). MED is the average of the square roots
of the sum of the squared differences between the discrimination and difficulty parameter
estimates and their generating values. MED is defined as
$(1/n) \sum_{j=1}^{n} \{(\hat{\xi}_j - \xi_j)'(\hat{\xi}_j - \xi_j)\}^{1/2}$,
where $\hat{\xi}_j = (\hat{a}_j, \hat{b}_j)'$ and $\xi_j = (a_j, b_j)'$. One caveat in using MED, of course, is that item
discrimination and difficulty parameters are not expressed in comparable and interchangeable
metrics. Even so, MED does provide a potentially useful descriptive index.
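The four criteria can be written as short Python functions. This is a sketch of the definitions given above, not the code used in the study; the function names and the example numbers are hypothetical, and `bias` is applied here to the estimates pooled for one generating value.

```python
import math


def rmsd(estimates, truths):
    """Root mean square difference between estimates and true values."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)


def bias(estimates, true_value):
    """Mean of the estimates of one generating value minus that value."""
    return sum(estimates) / len(estimates) - true_value


def med(a_est, b_est, a_true, b_true):
    """Mean euclidean distance between (a, b) estimates and true pairs."""
    n = len(a_est)
    return sum(math.sqrt((ae - at) ** 2 + (be - bt) ** 2)
               for ae, be, at, bt in zip(a_est, b_est, a_true, b_true)) / n


def correlation(x, y):
    """Pearson correlation between estimates and true values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical estimates for three items.
a_true, b_true = [0.66, 1.0, 1.51], [-1.38, 0.0, 1.38]
a_est, b_est = [0.75, 1.05, 1.40], [-1.30, 0.05, 1.45]
print(rmsd(a_est, a_true), correlation(a_est, a_true))
print(med(a_est, b_est, a_true, b_true))
print(bias([0.75, 0.70, 0.68, 0.72], 0.66))
```

Because MED adds squared differences in discrimination and difficulty without rescaling, whichever parameter has the larger estimation errors dominates the index, which is the caveat noted above.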

Results 

RMSD and Correlation Results 

Item Discrimination. Average RMSDs of item discriminations over four replications are 
reported in Table 1. As sample size increased, RMSDs decreased; marginal RMSD means 
were .265 and .185 for sample sizes 100 and 300, respectively. 
Insert Table 1 about here



Two Bayes procedures, HB2 and MB, yielded smaller RMSDs than the ML procedure. 
For sample size 100, increasing the number of items increased the values of RMSD for HB2 
but reduced the values of RMSD for ML. Increasing the number of items reduced the size 
of RMSDs for sample size 300. When the loose prior, $\alpha_L$, was used in HB2 for sample size
100, it yielded comparatively smaller values of RMSD than did either of the tight prior
conditions. The tight prior condition, $\alpha_T$, in MB yielded smaller values of RMSD. There
seem to be no differences among RMSDs when the tight prior conditions, $\alpha_T$ and $\alpha\beta_T$, were
used for sample size 300.

The average correlations between true and estimated values of item discriminations across 
four replications are also given in Table 1. For each data set, HB2 yielded a slightly higher 
correlation than MB and ML. Generally, the larger the sample size, the higher the correlation. 
Also, increasing the number of items tended to produce higher correlations. For the three 
prior conditions used, no definitive tendency was observed in the correlations. 

Item Difficulty. Table 2 contains the average RMSDs for item difficulty over four 
replications. An increase in sample size appeared to be associated with a decrease in the 
size of RMSDs. For sample size 100, increasing the number of items appeared to slightly 
decrease RMSDs except for ML. For sample size 300, increasing the number of items from 15 to
45 resulted in larger values of RMSD. The values of RMSD from ML were consistently larger
than the values from HB2 and MB regardless of sample sizes or test lengths. 



Insert Table 2 about here 



The prior-$\alpha\beta_T$ condition yielded relatively smaller RMSDs than did either the prior-$\alpha_L$ or
prior-$\alpha_T$ conditions. HB2 consistently yielded smaller RMSDs than MB across all the prior
conditions employed. 

For each data set, all estimation procedures yielded nearly the same correlations between 
estimates and parameters (see Table 2). Generally, the larger sample sizes yielded higher 
correlations. Increasing the number of items yielded slightly higher correlations for
100-examinee data sets. This tendency was not observed for 300-examinee data sets. There
seemed to be no definitive trends in the correlations among the three prior conditions. ML
yielded consistently lower correlations than did either HB2 or MB. It can be noticed, however, 
that all correlations were very high and close to 1. 

Bias Results 

Item Discrimination. The bias results for item discrimination, presented in Figure 1, appear 
to reflect influence by a number of factors. Each bias statistic was obtained by combining 
results from all four replications together. 



Insert Figure 1 about here 



For each test length, increasing sample size resulted in a decrease in bias values. In 
general, when Bayes estimation procedures were used, positive bias values were observed for 
the smaller item discrimination parameters (i.e., aj = .66 for the 15-item test, and aj = .57 
and .76 for the 45-item test) due to regression toward the mean of the prior distribution. 
Conversely, negative values of bias were obtained for the relatively larger item discrimination 
parameters (i.e., aj = 1.51 for the 15-item test, and aj = 1.32 and 1.77 for the 45-item test).
This shrinkage effect can be observed for nearly all data sets for HB2 and MB. HB2 yielded 
slightly more biased results. 

Both tight prior conditions yielded relatively more biased results. The patterns of bias 
from HB2 and MB were very similar. ML yielded different patterns of bias than did the 
HB2 and MB procedures. The differences in bias patterns between ML and the two Bayes 
procedures were very pronounced in sample size 100. The differences diminished as the 
sample size increased to 300. 

Item Difficulty. The bias results for item difficulty are reported in Figure 2. The pattern 
of results was somewhat different from that for item discrimination. For the 15-item test, 
all estimation procedures yielded nearly the same pattern of no bias. For the 45-item test,
HB2, MB, and ML likewise yielded nearly the same pattern of no bias. For sample size 300,
the patterns show almost no bias. Sample size 300
yielded relatively more stable bias results than sample size 100. 

Insert Figure 2 about here 

MED Results 



Average MEDs between item parameter estimates and underlying item parameters over four 
replications are reported in Table 3. HB2 and MB yielded smaller MEDs than ML. For 
sample size 100, HB2 yielded smaller MEDs than MB under prior-αL, whereas HB2 yielded 
larger MEDs than MB under both prior-αT and prior-αβT. For sample size 300, HB2 yielded 
consistently smaller MEDs than MB for all prior conditions. Also for sample size 300, the 
tight priors condition, prior-αβT, yielded relatively smaller MEDs within each Bayes 
estimation procedure. It can be noticed from Table 3 that MEDs decreased as the sample size 
increased. Increasing the number of items also reduced the MEDs in most conditions. 
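
The two recovery summaries used in these results can be sketched in a few lines of Python. This is only an illustration: it assumes RMSD is the square root of the mean squared difference between estimates and generating values, and MED is the mean over items of the Euclidean distance between the estimated and generating (discrimination, difficulty) pairs. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def rmsd(estimates, truths):
    """Root mean square difference between estimates and generating values."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def med(est_pairs, true_pairs):
    """Mean Euclidean distance between estimated and generating (a, b) pairs."""
    n = len(true_pairs)
    return sum(math.dist(e, t) for e, t in zip(est_pairs, true_pairs)) / n

# Toy values for three items (hypothetical, for illustration only).
true_pairs = [(0.66, -1.0), (1.00, 0.0), (1.51, 1.0)]
est_pairs = [(0.96, -1.4), (1.00, 0.0), (1.51, 1.0)]

print(rmsd([a for a, _ in est_pairs], [a for a, _ in true_pairs]))
print(med(est_pairs, true_pairs))
```

Because MED combines both parameters of an item into one distance, a procedure can show a small MED even when one of the two parameters is recovered less well than the other.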



Insert Table 3 about here 



Discussion 

Maximum likelihood approaches in IRT suffer from a number of problems, an important 
one for the two-parameter logistic model being the possibility that outward drift of item 
discrimination estimates occurs and, consequently, unreasonable values will be obtained for 
parameter estimates. In addition, these approaches perform poorly when estimating item 
and ability parameters for unusual response patterns such as all correct or all incorrect 
answers. These problems have led to interest in the development of Bayesian approaches for 
estimation of item and ability parameters (Baker, 1987). In the present study, we used a 
recovery study approach to compare parameter estimates for the two-parameter logistic IRT 
model obtained via the two marginal Bayesian algorithms, HB2 and MB, and the maximum 
likelihood algorithm, ML. 

Analysis of item parameter recovery results indicated that HB2 and MB yielded 
parameter estimates which were generally better than those obtained from ML. RMSD 
and MED results for item discrimination and difficulty were consistently larger for the ML 
estimates than for the HB2 and MB estimates. HB2 and MB estimates were similar, although 
HB2 results were slightly better for prior-αL and MB results were better for the tight prior 
conditions. 

When N = 300, there seems to be essentially no bias in item discrimination estimates 
yielded by either HB2 or MB. Note that under ML, except for the N = 100 and n = 45 data 
sets, positive values of bias were found for large values of item discrimination (1.51 when 
n = 15 and 1.77 when n = 45). It should be noted that no incidence of nonconvergence 
due to outward drift of discrimination estimates occurred in ML for any of the data sets. The 
bias results for item difficulty were almost identical for all estimation procedures and 
prior conditions. ML of course did not employ a prior distribution. All prior conditions 
contain a prior for item discrimination. Only prior-αβT contained an additional prior for 
item difficulty. Many recovery studies indicate excellent recovery of item 
difficulty parameters. This might explain why we have the same pattern 
of bias for item difficulty regardless of estimation procedure. 

Both the shape and the variance of the prior distribution play a part in the Bayesian 
estimation of parameters. The more informative the prior, that is, the smaller the variance, 
the more the parameter estimate tends to be pulled toward the mean of the prior. In general, 
the use of tight priors seems appropriate when there is strong a priori information about 
the parameters. In the MB context, the same prior distributions were directly imposed on 
item parameters. Without the use of the empirical Bayes (i.e., FLOAT) option, the incorrect 
specification of the prior may result in more serious consequences for MB than HB2. Mislevy 
and Stocking (1989) recommended the use of the FLOAT option in BILOG when there is a 
possibility of mismatch between the expected value of item parameters and the prior mean. 
This issue was not tested in the present study because priors were relatively well matched 
to the generated data sets. In this regard, several issues remain to be studied in the present 
context. In particular, except for Gifford and Swaminathan (1990) and Harwell and Janosky 
(1991), little has been done on the shrinkage effect. Neither are the effects of priors well 
known with respect to the robustness of the two-stage hierarchical model or other Bayes 
procedures. This kind of research is particularly valuable for small samples and short tests. 
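
The pull toward the prior mean described above can be illustrated with a deliberately simplified sketch. It assumes a normal prior and a normally distributed estimate with known sampling variance, which is a textbook conjugate setting and not the marginal Bayesian estimator used in this study. The function name and all numerical values are hypothetical.

```python
def shrunken_estimate(ml_estimate, sampling_var, prior_mean, prior_var):
    """Posterior mean under a normal prior and a normal likelihood.

    The weight on the data grows as the prior variance grows, so a loose
    prior leaves the estimate almost unchanged and a tight prior pulls it
    strongly toward the prior mean.
    """
    data_precision = 1.0 / sampling_var
    prior_precision = 1.0 / prior_var
    weight = data_precision / (data_precision + prior_precision)
    return weight * ml_estimate + (1.0 - weight) * prior_mean

# Hypothetical values: an estimate of 1.5 with sampling variance 0.25.
loose = shrunken_estimate(1.5, 0.25, prior_mean=0.0, prior_var=1.0)
tight = shrunken_estimate(1.5, 0.25, prior_mean=0.0, prior_var=0.25)
print(loose, tight)
```

In this toy setting the tight prior moves the estimate halfway to the prior mean, while the loose prior moves it only one fifth of the way, which mirrors the larger shrinkage seen under the tight prior conditions.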

A prior distribution represents what is known about the parameter before the data are 
obtained. Consequently the role of the prior distribution is central in Bayesian analysis. The 
prior used in the Bayes procedures in this paper assumes independence and exchangeability 
among all item parameters. Sometimes dependence between item parameters should be 
considered. In this regard, Mislevy (1986) presented multivariate normal priors to account 
for dependency within item parameters. In addition, if the exchangeability of items cannot 
be assumed, we cannot use the same prior distribution for each item. Assuming all item 
parameter estimates and the corresponding estimated variance and covariance matrices from 
previous and possibly different calibrations were placed on the same ability metric, for 
example, on the usual ability N(0, 1) metric, we can employ a different prior distribution for 
each item based upon existing information regarding the underlying item parameters. 

In a usual Bayesian approach, prior distributions are used for the ordinary or transformed 
item parameters. To represent prior information of item parameters in terms of the item 
response function, confidence ellipsoids suggested by Thissen and Wainer (1990) can be 
helpful. In the two-stage hierarchical approach, we specified prior information on the 
hyperparameters. An alternative approach is to use a prior distribution based on the entire 
item response function rather than on the item parameters. Tsutakawa and Lin (1986) suggested 
the use of an ordered bivariate beta prior distribution for values of the item response function 
at two ability levels. Also Tsutakawa (1992) suggested the use of the ordered Dirichlet prior 
on the entire item response function. 

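Note that the posterior density of the hyperparameters may be closely approximated. For 
example, let R denote the 2n x 2n posterior information matrix (R = -H), consisting of the 
appropriate second derivatives of $\log p(\alpha, \beta \mid y)$ evaluated at the marginal modes. Then 
the dispersion matrix $R^{-1}$ provides an approximation to the posterior covariance matrix. 
Instead of estimating the prior parameters $\mu_\alpha$, $\mu_\beta$, $\sigma_\alpha^2$, and $\sigma_\beta^2$, and following Leonard (1982), 
Leonard, Hsu, and Tsui (1989), and Tierney and Kadane (1986), we can approximate the 
posterior density of these hyperparameters by the Laplacian approximation 

\[
  p(\mu_\alpha, \mu_\beta, \sigma_\alpha^2, \sigma_\beta^2 \mid y) \propto
  p(\hat{\alpha}, \hat{\beta}, \mu_\alpha, \mu_\beta, \sigma_\alpha^2, \sigma_\beta^2 \mid y) \, / \, |R|^{1/2},
  \tag{20}
\]

with $\alpha$ and $\beta$ evaluated at their modes. 

Two possible choices of distribution for the prior parameters are the uniform distribution 
$p(\mu_\alpha, \mu_\beta, \sigma_\alpha^2, \sigma_\beta^2) \propto 1$ and the choice from Lindley and Smith (1972), which is 

\[
  p(\mu_\alpha, \mu_\beta, \sigma_\alpha^2, \sigma_\beta^2) \propto
  (\sigma_\alpha^2)^{-(\nu_\alpha + 2)/2} (\sigma_\beta^2)^{-(\nu_\beta + 2)/2}
  \exp\left[ -\frac{\nu_\alpha \lambda_\alpha}{2 \sigma_\alpha^2} - \frac{\nu_\beta \lambda_\beta}{2 \sigma_\beta^2} \right],
  \tag{21}
\]

which takes all four hyperparameters to be independent, $\mu_\alpha$ and $\mu_\beta$ each to be uniformly 
distributed over $(-\infty, \infty)$, and $\nu_\alpha \lambda_\alpha / \sigma_\alpha^2$ and $\nu_\beta \lambda_\beta / \sigma_\beta^2$ to possess chi-square distributions with 
respective degrees of freedom $\nu_\alpha$ and $\nu_\beta$. This permits the specification of prior means $\lambda_\alpha$ and 
$\lambda_\beta$ for $\sigma_\alpha^2$ and $\sigma_\beta^2$ based on previous distribution information, together with prior sample 
sizes $\nu_\alpha$ and $\nu_\beta$. It should be noted that importance sampling (Hsu, Leonard, & Tsui, 
1991) and the Gibbs sampler (Gelfand & Smith, 1990) also can be applied to this situation. 
Comparisons among the above method and other Bayes approaches are needed to provide 
guidelines for using Bayes methods under IRT. 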
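
The chi-square statement above fixes the prior on each variance hyperparameter up to a constant. The following sketch shows one way to evaluate and sample such a prior with the Python standard library. The names log_prior_variance and draw_variance, and the values chosen for the degrees of freedom and the prior value, are hypothetical and serve only as an illustration.

```python
import math
import random

def log_prior_variance(sigma2, nu, lam):
    """Log density, up to a constant, of sigma2 when nu*lam/sigma2 ~ chi-square(nu)."""
    return -(nu / 2.0 + 1.0) * math.log(sigma2) - nu * lam / (2.0 * sigma2)

def draw_variance(nu, lam, rng):
    """Draw sigma2 by inverting a chi-square(nu) variate."""
    # A chi-square(nu) variate is a gamma variate with shape nu/2 and scale 2.
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return nu * lam / chi2

rng = random.Random(1997)
nu, lam = 5.0, 0.25  # hypothetical prior sample size and prior value
draws = [draw_variance(nu, lam, rng) for _ in range(20000)]

# Since E[chi-square(nu)] = nu, the mean of 1/sigma2 should be close to 1/lam.
mean_precision = sum(1.0 / s for s in draws) / len(draws)
print(mean_precision, 1.0 / lam)
```

Larger degrees of freedom concentrate the draws more tightly, which is the sense in which the degrees of freedom act like a prior sample size.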

Conclusion 

Many estimation methods have been introduced for estimating item and ability parameters 
in the context of IRT. There is still a great need for efficient algorithms of the Bayesian 
approach. In this paper, a procedure is presented for obtaining marginal Bayesian estimates 
of item parameters with a two-stage hierarchical prior distribution for dichotomously scored 
IRT models. When the procedure is applied to the simulated data, the item parameter 
estimates from HB2 are found to agree with estimates from MB. 

In order to estimate the unknown item parameters, the logarithm of the marginal posterior 
distribution, $\log p(\xi \mid y, \tau, \eta^{(2)}) \propto \log l(\xi \mid y, \tau) + \log p(\xi \mid \eta^{(2)}) = F(\xi)$, is maximized by taking 
partial derivatives with respect to the item parameters and setting them equal to zero. The 
resulting equations represent the marginal Bayesian estimation equations for item parameters 
with two-stage priors. As is the case for the maximum likelihood or the nonlinear least 
squares estimator, we cannot generally solve the estimation equations explicitly for item 
parameters. Instead, we must solve them iteratively. The Newton-Raphson method and 
some of its modifications can be used for this purpose (Kennedy & Gentle, 1980). The 
Newton-Raphson method requires use of both the gradient vector and the Hessian matrix in 
computations: 

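\[
  \xi^{(t)} = \xi^{(t-1)} - \left[ H^{(t-1)} \right]^{-1} f^{(t-1)},
  \tag{22}
\]

where 

\[
  f^{(t-1)} = \left. \frac{\partial F}{\partial \xi} \right|_{\xi^{(t-1)}},
  \tag{23}
\]

\[
  H^{(t-1)} = \left. \frac{\partial^2 F}{\partial \xi \, \partial \xi'} \right|_{\xi^{(t-1)}},
  \tag{24}
\]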



and t indexes the iteration. The iteration is repeated until the convergence criterion is met. 
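
The Newton-Raphson iteration described above can be sketched for a two-parameter problem as follows. The objective here is a hypothetical concave toy function, F(x, y) = x - exp(x) + 2y - exp(2y) - (x - y)^2 / 2, chosen only because its maximum is known to be at (0, 0); it is not the log posterior of the IRT model. The function names and the convergence tolerance are assumptions.

```python
import math

def toy_gradient(x, y):
    """Gradient of the toy objective F(x, y)."""
    return (1.0 - math.exp(x) - (x - y),
            2.0 - 2.0 * math.exp(2.0 * y) + (x - y))

def toy_hessian(x, y):
    """Hessian of the toy objective F(x, y)."""
    return ((-math.exp(x) - 1.0, 1.0),
            (1.0, -4.0 * math.exp(2.0 * y) - 1.0))

def newton_raphson(grad, hess, start, tol=1e-10, max_iter=50):
    """Iterate new = old - inverse(Hessian) * gradient until the step is tiny."""
    x, y = start
    for t in range(1, max_iter + 1):
        g1, g2 = grad(x, y)
        (h11, h12), (h21, h22) = hess(x, y)
        det = h11 * h22 - h12 * h21
        # Explicit inverse of the 2 x 2 Hessian applied to the gradient.
        step_x = (h22 * g1 - h12 * g2) / det
        step_y = (-h21 * g1 + h11 * g2) / det
        x, y = x - step_x, y - step_y
        if max(abs(step_x), abs(step_y)) < tol:
            return (x, y), t
    raise RuntimeError("Newton-Raphson did not converge")

solution, iterations = newton_raphson(toy_gradient, toy_hessian, (0.5, 0.5))
print(solution, iterations)
```

The convergence criterion used here is the size of the largest parameter change; other criteria, such as the size of the gradient, would serve equally well.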

Since the dimensionality of all terms in the Newton-Raphson equation is of order 2n, 
when the number of items is large, matrices and vectors of considerable size result. These 
are beyond the capabilities of most digital computers and ways to reduce the dimensionality 
must be found. We can accomplish this using the EM algorithm (Bock & Aitkin, 1981; 
Dempster, Laird, & Rubin, 1977). We assume that items are independent; hence, the 
estimation proceeds one item at a time. The Newton-Raphson equation becomes 

\[
  \xi_j^{(t)} = \xi_j^{(t-1)} - \left[ H_j^{(t-1)} \right]^{-1} f_j^{(t-1)},
  \tag{25}
\]

where $\xi_j = (\alpha_j, \beta_j)'$ and $f_j$ and $H_j$ are the 2 x 1 gradient vector and the 2 x 2 Hessian 
matrix for item j. 



The individual elements which are needed in the Newton-Raphson iteration for the HB2 
procedure using the Gaussian quadrature formula are given in Table A. 
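
As an illustration of how the likelihood rows of Table A enter a single-item update, the sketch below computes the gradient and the second-derivative terms for one item and takes one Newton-Raphson step. It assumes the response function has slope exp(alpha_j) and difficulty beta_j with no additional scaling constant, and it omits the prior rows of the table. The function names, the quadrature nodes, and the expected counts are hypothetical.

```python
import math

def prob_correct(x, alpha, beta):
    """2PL response function with slope exp(alpha) and difficulty beta (assumed form)."""
    return 1.0 / (1.0 + math.exp(-math.exp(alpha) * (x - beta)))

def expected_loglik(alpha, beta, nodes, rbar, nbar):
    """Expected complete-data log likelihood for one item."""
    total = 0.0
    for x, r, n in zip(nodes, rbar, nbar):
        p = prob_correct(x, alpha, beta)
        total += r * math.log(p) + (n - r) * math.log(1.0 - p)
    return total

def likelihood_derivatives(alpha, beta, nodes, rbar, nbar):
    """Gradient and second-derivative terms from the likelihood rows only."""
    g_a = g_b = h_aa = h_bb = h_ab = 0.0
    for x, r, n in zip(nodes, rbar, nbar):
        p = prob_correct(x, alpha, beta)
        resid = r - n * p            # rbar_jk - Nbar_jk * P_j(X_k)
        w = p * (1.0 - p) * n        # P_j(X_k) * Q_j(X_k) * Nbar_jk
        g_a += (x - beta) * resid
        g_b += resid
        h_aa += (x - beta) ** 2 * w
        h_bb += w
        h_ab += (x - beta) * w
    ea, e2a = math.exp(alpha), math.exp(2.0 * alpha)
    grad = (ea * g_a, -ea * g_b)
    hess = ((-e2a * h_aa, e2a * h_ab), (e2a * h_ab, -e2a * h_bb))
    return grad, hess

def newton_step(alpha, beta, nodes, rbar, nbar):
    """One Newton-Raphson update of (alpha, beta) for a single item."""
    (g_a, g_b), ((h_aa, h_ab), (_, h_bb)) = likelihood_derivatives(
        alpha, beta, nodes, rbar, nbar)
    det = h_aa * h_bb - h_ab * h_ab
    return (alpha - (h_bb * g_a - h_ab * g_b) / det,
            beta - (-h_ab * g_a + h_aa * g_b) / det)

# Hypothetical quadrature nodes and expected counts for one item.
nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
nbar = [10.0, 25.0, 30.0, 25.0, 10.0]
rbar = [1.0, 5.0, 14.0, 19.0, 9.0]
print(newton_step(0.0, 0.0, nodes, rbar, nbar))
```

Adding the prior rows would simply add their first and second derivatives to the corresponding entries of the gradient and Hessian before the step is taken.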



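Insert Table A about here 

Note that $\bar{r}_{jk}$ and $\bar{N}_{jk}$ in Table A are defined as 

\[
  \bar{r}_{jk} = \sum_{i=1}^{N} y_{ij} \, p(X_k \mid y_i, \xi, \tau)
  \tag{26}
\]

and 

\[
  \bar{N}_{jk} = \sum_{i=1}^{N} p(X_k \mid y_i, \xi, \tau),
  \tag{27}
\]

where 

\[
  p(X_k \mid y_i, \xi, \tau) =
  \frac{\prod_{j=1}^{n} P_j(X_k)^{y_{ij}} Q_j(X_k)^{1 - y_{ij}} A(X_k)}
       {\sum_{k=1}^{q} \prod_{j=1}^{n} P_j(X_k)^{y_{ij}} Q_j(X_k)^{1 - y_{ij}} A(X_k)}.
  \tag{28}
\]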

Obtaining $\bar{r}_{jk}$ and $\bar{N}_{jk}$ from the provisional item parameter estimates is the expectation 
(E) step of the EM algorithm. The maximization (M) step is to solve the Newton-Raphson 
equation for each item using the provisional $\bar{r}_{jk}$ and $\bar{N}_{jk}$ (Bock & Aitkin, 1981; Bock, 
Mislevy, & Thissen, 1991). The EM cycles are continued until we obtain a stable set of item 
parameter estimates. 
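
A minimal sketch of the E step follows. It assumes a response function with slope exp(alpha_j) and difficulty beta_j, a small set of quadrature nodes with weights, and complete response data. Because the expected number of examinees at a node is then the same for every item, it is stored once per node. All names and numerical values are hypothetical.

```python
import math

def prob_correct(x, alpha, beta):
    """2PL response function with slope exp(alpha) and difficulty beta (assumed form)."""
    return 1.0 / (1.0 + math.exp(-math.exp(alpha) * (x - beta)))

def e_step(responses, items, nodes, weights):
    """Return expected correct counts rbar[j][k] and expected examinees nbar[k]."""
    n_items, q = len(items), len(nodes)
    rbar = [[0.0] * q for _ in range(n_items)]
    nbar = [0.0] * q
    for y in responses:
        # Likelihood of this response pattern at each node times the node weight.
        joint = []
        for x, a_k in zip(nodes, weights):
            like = 1.0
            for (alpha, beta), y_ij in zip(items, y):
                p = prob_correct(x, alpha, beta)
                like *= p if y_ij == 1 else 1.0 - p
            joint.append(like * a_k)
        total = sum(joint)
        for k in range(q):
            post = joint[k] / total  # posterior probability of node k given the pattern
            nbar[k] += post
            for j in range(n_items):
                rbar[j][k] += y[j] * post
    return rbar, nbar

# Hypothetical provisional estimates (alpha_j, beta_j), nodes, weights, and data.
items = [(0.0, -0.5), (0.2, 0.5)]
nodes = [-1.5, 0.0, 1.5]
weights = [0.25, 0.5, 0.25]
responses = [[1, 1], [1, 0], [0, 0], [1, 1]]
rbar, nbar = e_step(responses, items, nodes, weights)
print(rbar, nbar)
```

A useful check is that the expected counts over all nodes sum to the number of examinees, and that the expected correct counts for an item sum to the observed number of correct responses to that item.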

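The EM solution may not provide an estimate of the posterior dispersion matrix. 
Therefore, to obtain the dispersion matrix, we need to solve the marginal Bayesian estimation 
equations after obtaining item parameter estimates from the converged EM solution. In this 
case the 2n x 2n Hessian matrix is 

\[
  H = \sum_{i=1}^{N} \frac{\partial^2 \log p(y_i \mid \xi, \tau)}{\partial \xi \, \partial \xi'}
      + \frac{\partial^2 \log p(\xi \mid \eta^{(2)})}{\partial \xi \, \partial \xi'}.
  \tag{29}
\]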

The summation in the Hessian matrix, however, involves all examinees and may not be 
practical to use. When we reformulate the response matrix into distinct response patterns 
and the corresponding frequencies, the Hessian matrix may become feasible to calculate. 
For practical purposes, we can further approximate the Hessian matrix with the use of the 
empirical information matrix (Bock, Mislevy, & Thissen, 1991). We may need only one or 
two Newton-Raphson iterations to improve the almost converged item parameter estimates of 
the EM solution. 
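
The reformulation of the response matrix into distinct response patterns and their frequencies can be sketched as follows. The function name collapse_patterns and the small data set are hypothetical.

```python
from collections import Counter

def collapse_patterns(responses):
    """Return the distinct response patterns and how often each one occurs."""
    counts = Counter(tuple(row) for row in responses)
    patterns = sorted(counts)
    frequencies = [counts[p] for p in patterns]
    return patterns, frequencies

# Six examinees, three items (hypothetical data).
responses = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
]
patterns, frequencies = collapse_patterns(responses)
print(patterns)
print(frequencies)
```

Any sum over examinees can then be replaced by a frequency-weighted sum over patterns, which saves work whenever the number of distinct patterns is smaller than the number of examinees.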

Table 1 

Root Mean Square Differences (RMSD) and Correlation of Item Discrimination 
Averaged Over Four Replications 

                   Hierarchical Bayesian-2             Marginal Bayesian
Sample  Item   Prior-αL  Prior-αT  Prior-αβT   Prior-αL  Prior-αT  Prior-αβT     ML

RMSD
  100    15      .225      .251      .246        .255      .227      .238       .412
  100    45      .233      .276      .276        .255      .231      .238       .348
  300    15      .192      .186      .185        .205      .183      .186       .254
  300    45      .161      .159      .160        .181      .159      .161       .216

Correlation
  100    15      .673      .673      .691        .667      .671      .644       .657
  100    45      .688      .686      .693        .679      .682      .671       .676
  300    15      .820      .824      .823        .819      .823      .818       .815
  300    45      .864      .866      .866        .863      .865      .863       .860

Table 2 

Root Mean Square Differences (RMSD) and Correlation of Item Difficulty 
Averaged Over Four Replications 

                   Hierarchical Bayesian-2             Marginal Bayesian
Sample  Item   Prior-αL  Prior-αT  Prior-αβT   Prior-αL  Prior-αT  Prior-αβT     ML

RMSD
  100    15      .309      .315      .297        .315      .316      .308       .334
  100    45      .284      .290      .269        .298      .287      .277       .352
  300    15      .164      .159      .151        .174      .163      .161       .207
  300    45      .187      .184      .177        .197      .186      .184       .224

Correlation
  100    15      .951      .956      .958        .955      .955      .955       .950
  100    45      .963      .962      .964        .958      .962      .963       .942
  300    15      .988      .989      .990        .987      .989      .989       .981
  300    45      .983      .983      .984        .981      .983      .983       .975

Table 3 

Mean Euclidean Distances (MED) Averaged Over Four Replications 

                   Hierarchical Bayesian-2             Marginal Bayesian
Sample  Item   Prior-αL  Prior-αT  Prior-αβT   Prior-αL  Prior-αT  Prior-αβT     ML

  100    15      .335      .355      .343        .359      .332      .330       .451
  100    45      .320      .342      .331        .344      .322      .317       .423
  300    15      .222      .212      .205        .234      .212      .211       .268
  300    45      .214      .211      .209        .230      .213      .212       .262






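Table A 

First and Second Derivatives of the Log Posterior Distribution for HB2 

Parameter: $\alpha_j$    Contribution: Likelihood
  First derivative:   $\exp(\alpha_j) \sum_{k=1}^{q} (X_k - \beta_j) [\bar{r}_{jk} - \bar{N}_{jk} P_j(X_k)]$
  Second derivative:  $-\exp(2\alpha_j) \sum_{k=1}^{q} (X_k - \beta_j)^2 P_j(X_k) Q_j(X_k) \bar{N}_{jk}$

Parameter: $\alpha_j$    Contribution: Prior
  First derivative:   $-\frac{1}{s_\alpha^2} (\alpha_j - \bar{\alpha})$
  Second derivative:  $-\frac{1}{s_\alpha^4} \left[ \frac{s_\alpha^2 (n - 1)}{n} - \frac{2 (\alpha_j - \bar{\alpha})^2}{n + \nu_\alpha - 1} \right]$

Parameter: $\beta_j$    Contribution: Likelihood
  First derivative:   $-\exp(\alpha_j) \sum_{k=1}^{q} [\bar{r}_{jk} - \bar{N}_{jk} P_j(X_k)]$
  Second derivative:  $-\exp(2\alpha_j) \sum_{k=1}^{q} P_j(X_k) Q_j(X_k) \bar{N}_{jk}$

Parameter: $\beta_j$    Contribution: Prior
  First derivative:   $-\frac{1}{s_\beta^2} (\beta_j - \bar{\beta})$
  Second derivative:  $-\frac{1}{s_\beta^4} \left[ \frac{s_\beta^2 (n - 1)}{n} - \frac{2 (\beta_j - \bar{\beta})^2}{n + \nu_\beta - 1} \right]$

Parameter: $\alpha_j \beta_j$    Contribution: Likelihood
  Second derivative:  $\exp(2\alpha_j) \sum_{k=1}^{q} (X_k - \beta_j) P_j(X_k) Q_j(X_k) \bar{N}_{jk}$

where $s_\alpha^2 = \left[ \sum_{j=1}^{n} (\alpha_j - \bar{\alpha})^2 + \nu_\alpha \lambda_\alpha \right] / (n + \nu_\alpha - 1)$, $\bar{\alpha} = n^{-1} \sum_{j=1}^{n} \alpha_j$, 
$s_\beta^2 = \left[ \sum_{j=1}^{n} (\beta_j - \bar{\beta})^2 + \nu_\beta \lambda_\beta \right] / (n + \nu_\beta - 1)$, and $\bar{\beta} = n^{-1} \sum_{j=1}^{n} \beta_j$. 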

Figure Captions 



Figure 1. Bias plots for item discrimination. 
Figure 2. Bias plots for item difficulty. 